It usually starts with a celebratory email: “We secured the GPUs.” And then the program still slips.
In AI infrastructure, the most expensive component is rarely the one that sets the schedule. Teams can secure accelerators and still miss the launch date because a transformer delivery moved by 12 weeks or a tranche of optics failed qualification. Recent industry reporting suggests lead times for flagship AI GPUs and systems have eased to roughly 8 to 16 weeks in some channels, while high-voltage electrical equipment remains the long pole: switchgear runs approximately 45 to 80 weeks, and large power transformers can range from about 80 to 210 weeks.
For procurement and capacity planners, this is the new reality: an AI “cluster” isn’t just a purchase order. It’s a high-stakes orchestration of a global supply chain in which the critical path is often hidden in the “boring” parts.
The real problem: AI capacity is a system, not a SKU
AI programs fail to launch on time when procurement and engineering treat the bill of materials (BOM) like a checklist. In practice, the BOM behaves like a project network diagram. Some items are interchangeable; others are hard bottlenecks whose delays cascade through everything downstream.
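To make the network-diagram idea concrete, here is a minimal Python sketch that finds the gating chain in a toy BOM. The GPU, switchgear, and transformer figures are the low ends of the lead-time ranges cited earlier; every other duration, the item names, and the dependency map are illustrative assumptions, not data from any real program.

```python
from functools import lru_cache

# Weeks to deliver or complete each item. Values marked "assumed" are
# placeholders chosen for illustration only.
LEAD_WEEKS = {
    "transformer": 80,           # low end of the cited 80-210 week range
    "switchgear": 45,            # low end of the cited 45-80 week range
    "gpu_systems": 8,            # low end of the cited 8-16 week range
    "optics": 20,                # assumed
    "power_commissioning": 6,    # assumed
    "network_qualification": 4,  # assumed
    "cluster_bringup": 3,        # assumed
}

# Hypothetical dependency map: an item cannot start until all of its
# predecessors have finished.
DEPENDS_ON = {
    "power_commissioning": ["transformer", "switchgear"],
    "network_qualification": ["optics"],
    "cluster_bringup": [
        "power_commissioning",
        "network_qualification",
        "gpu_systems",
    ],
}


@lru_cache(maxsize=None)
def earliest_finish(item: str) -> int:
    """Earliest week the item can finish if everything is ordered at week 0."""
    preds = DEPENDS_ON.get(item, [])
    start = max((earliest_finish(p) for p in preds), default=0)
    return start + LEAD_WEEKS[item]


def critical_path(item: str) -> list[str]:
    """Chain of items that determines when `item` finishes."""
    preds = DEPENDS_ON.get(item, [])
    if not preds:
        return [item]
    gating = max(preds, key=earliest_finish)
    return critical_path(gating) + [item]


weeks = earliest_finish("cluster_bringup")
path = critical_path("cluster_bringup")
print(f"{weeks} weeks via {' -> '.join(path)}")
```

Under these assumptions the program takes 89 weeks, and the gating chain runs through the transformer and power commissioning rather than through the accelerators. Swapping in your own lead times and dependencies shows which purchase actually sets the date.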
A good mental model is to divide the AI cluster into four layers, each with its own failure modes:
The AI cluster BOM (a procurement lens)
Compute: accelerators are necessary, not sufficient
This includes accelerators (GPU/TPU), hosts (CPU, memory), storage, firmware, and support commitments. The risk is well known: allocation constraints, configuration churn, and ecosystem lock-in.
- The procurement shift: Stop buying parts; start buying capacity on a timeline. Ensure your contracts include enforceable delivery cadences and clear substitution paths.
Networking: the quiet schedule killer
High-speed networks drive performance—and delays. Switches, NICs/DPUs, fiber/cabling, and especially optics/transceivers can become gating constraints.
- The procurement shift: Move beyond “generic inventory.” Do you have qualified alternates and enough spares for your specific topology? Interoperability issues usually surface during the final 10% of the build, when the cost of a change-order is highest.
Power: your longest lead-time supplier
Transformers, switchgear, breakers, busways, UPS, generators, PDUs—and, crucially, commissioning services. Many teams discover too late that “power” is a supply chain with factory slots, specialized labor, and dependencies on utilities and permitting.
- The procurement shift: Secure the “power chain” and commissioning windows before the compute even leaves the factory. There is nothing more expensive than "stranded compute" sitting in a dark data center.
Cooling & mechanical: density makes it strategic
As rack densities climb, cooling moves from “facilities” to “core strategy.” Liquid cooling introduces a complex new ecosystem: CDUs, cold plates, leak detection, and a service model to support them.
Operator survey data shows this shift is already underway: Uptime reports 22% of respondents are making some use of direct liquid cooling and 61% would consider it.
- The procurement shift: You aren't just buying hardware; you’re contracting for reliability. Focus on the service models and the availability of specialized spares.
What procurement should lock first: the critical path order
If procurement wants the program to launch on time, the sourcing sequence needs to mirror the schedule’s critical path. In many AI deployments, the early locks should be:
- Power chain + commissioning capacity. Factory slots, commissioning labor, and long lead equipment must be reserved early. Without power readiness, everything else becomes stranded inventory.
- Networking (optics + switching) and qualification. Optics and topology-specific components should be treated as “gating parts,” not accessories. Build a spare strategy and pre-qualify substitutes.
- Cooling architecture decisions. Air vs. liquid isn’t a facilities detail—it drives rack specs, maintenance models, and lead times. Decide early and contract the operational model (spares, SLAs, response times).
- Accelerator supply agreements (with enforceable cadence). Yes, GPUs matter. But the contract needs to reflect reality: allocation protections, delivery rhythm, and substitution rules that don’t break the system design.
A useful test is simple: If a shipment arrived tomorrow, could you install and run it? If not, the “locked” items were not the right ones.
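The test above can be written down as a simple checklist. The sketch below is one hypothetical way to do it in Python; the prerequisite names and the `blockers` helper are assumptions made for illustration, and a real program would define its own list.

```python
# Hypothetical prerequisites that must be in place before a shipment of
# accelerator systems can be installed and run.
PREREQUISITES = {
    "gpu_systems": [
        "power_energized",
        "cooling_commissioned",
        "network_qualified",
        "rack_space_ready",
    ],
}


def blockers(shipment: str, ready: set[str]) -> list[str]:
    """Return the prerequisites still missing for the given shipment."""
    return [p for p in PREREQUISITES[shipment] if p not in ready]


# Example: the network and racks are ready, but power and cooling are not.
missing = blockers("gpu_systems", {"network_qualified", "rack_space_ready"})
if missing:
    print("Shipment would be stranded by:", ", ".join(missing))
else:
    print("Shipment could be installed and run.")
```

Any item that shows up as a blocker is a candidate for an earlier lock in the sourcing sequence.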
Contract clauses that prevent the usual “surprise delays”
Setting legal boilerplate aside, these are the practical levers that consistently reduce risk:
- Delivery cadence commitments (monthly/quarterly shipment windows)
- Pre-approved substitutions for high-risk components (optics, breakers, specific cables)
- Interoperability ownership (who is accountable when multi-vendor integration fails)
- Commissioning/service SLAs (response times, escalation paths, spares availability)
- Evidence-based lead times (factory slot reservations and written capacity confirmations)
These clauses do one thing: they turn “best effort” into a plan you can run.
The KPI reset: measure time-to-compute, not unit cost
Traditional procurement metrics, such as unit price and savings versus last year, don’t map cleanly to AI infrastructure. The opportunity cost of an idle high-end AI cluster can quickly reach six figures per day, and for very large deployments it can approach or exceed seven figures, depending on workload economics.
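A back-of-the-envelope calculation shows how the daily figure arises. The Python sketch below assumes a hypothetical 4,096-accelerator cluster whose capacity is worth $2.50 per accelerator-hour; both numbers are placeholders for illustration, not market data, and should be replaced with your own workload economics.

```python
def idle_cost_per_day(accelerators: int, value_per_accel_hour: float) -> float:
    """Value of compute capacity forgone for each day the cluster sits idle."""
    return accelerators * value_per_accel_hour * 24


# Assumed inputs: cluster size and hourly value are hypothetical.
daily = idle_cost_per_day(accelerators=4096, value_per_accel_hour=2.50)

# Cost of a 12-week slip, such as the delayed transformer delivery
# described at the start of the article.
slip_weeks = 12
slip_cost = daily * slip_weeks * 7

print(f"Idle cost per day: ${daily:,.0f}")
print(f"Cost of a {slip_weeks}-week slip: ${slip_cost:,.0f}")
```

With these inputs the idle cost is $245,760 per day, and a 12-week slip forgoes roughly $20.6 million of capacity, which dwarfs most unit-price savings.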
The KPI that matters is time-to-compute: the date a usable cluster is available for workloads. To support that, procurement should also track:
- Time-to-power readiness
- Fill rate for gating components (optics, breakers, CDUs)
- Change-order frequency (a proxy for spec churn and poor upstream alignment)
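These measures are straightforward to compute once the underlying dates and counts are recorded. The Python sketch below is one possible formulation; the `ProgramSnapshot` fields, the sample values, and the choice to express change-order frequency as change orders per purchase order are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ProgramSnapshot:
    """Hypothetical record of one AI capacity program."""
    kickoff: date
    power_ready: date
    cluster_usable: date
    gating_units_ordered: int
    gating_units_received: int
    change_orders: int
    purchase_orders: int

    def time_to_power_days(self) -> int:
        return (self.power_ready - self.kickoff).days

    def time_to_compute_days(self) -> int:
        return (self.cluster_usable - self.kickoff).days

    def gating_fill_rate(self) -> float:
        # Share of ordered gating components (optics, breakers, CDUs)
        # that have actually been received.
        if self.gating_units_ordered == 0:
            return 0.0
        return self.gating_units_received / self.gating_units_ordered

    def change_order_frequency(self) -> float:
        # Assumed definition: change orders per purchase order issued.
        if self.purchase_orders == 0:
            return 0.0
        return self.change_orders / self.purchase_orders


# Sample values are invented for the example.
snapshot = ProgramSnapshot(
    kickoff=date(2025, 1, 6),
    power_ready=date(2026, 3, 2),
    cluster_usable=date(2026, 4, 13),
    gating_units_ordered=1000,
    gating_units_received=900,
    change_orders=6,
    purchase_orders=40,
)

print("Time-to-power (days):", snapshot.time_to_power_days())
print("Time-to-compute (days):", snapshot.time_to_compute_days())
print("Gating fill rate:", snapshot.gating_fill_rate())
print("Change-order frequency:", snapshot.change_order_frequency())
```

Tracking the gap between time-to-power and time-to-compute is often as informative as either number alone, because it shows how long compute waited on everything else.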
Three takeaways for procurement leaders
- Stop buying parts—buy readiness. Lock the power chain, commissioning windows, and topology-critical networking early so compute doesn’t become stranded inventory.
- Treat the BOM like a supply chain with an intertwined critical path. Generally, the bottleneck isn’t the most expensive component; it tends to be the one with the longest lead time and lowest substitutability.
- Rewrite contracts for reality. Delivery cadence, substitutions, interoperability accountability, and service SLAs are what turn AI infrastructure from a risky build into a predictable program.
In AI infrastructure, procurement doesn’t just control cost—it controls whether capacity exists when the business needs it. The teams that win are the ones who source the critical path first, engineer options into contracts, and manage the AI BOM as a system.
About the author
Shilen Jhaveri works in engineering program management at a leading technology company, focusing on AI infrastructure and global supply chains. His experience spans infrastructure planning and the coordination of complex supply networks that support large-scale compute and power needs. Prior to his current role, he helped major enterprises design and implement tools to improve supply chain visibility and logistics performance. He regularly engages in industry conversations on AI infrastructure, data center planning and supply chain topics.